7 research outputs found

    Monocular Vision based Crowdsourced 3D Traffic Sign Positioning with Unknown Camera Intrinsics and Distortion Coefficients

    Full text link
    Autonomous vehicles and driver assistance systems utilize maps of 3D semantic landmarks for improved decision making. However, scaling the mapping process and regularly updating such maps come at a huge cost. Crowdsourced mapping of these landmarks, such as traffic sign positions, provides an appealing alternative. The state-of-the-art approaches to crowdsourced mapping use ground-truth camera parameters, which may not always be known or may change over time. In this work, we demonstrate an approach to computing 3D traffic sign positions without knowing the camera focal lengths, principal point, and distortion coefficients a priori. We validate our proposed approach on a public dataset of traffic signs in KITTI. Using only a monocular color camera and GPS, we achieve an average single-journey relative and absolute positioning accuracy of 0.26 m and 1.38 m, respectively.
    Comment: Accepted at the 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC).
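
    The abstract does not spell out the geometry, but the final step of such a pipeline typically reduces to triangulating a detected sign from two views once approximate intrinsics are available. Below is a minimal, hypothetical sketch of that step: the intrinsics are initialized from the image size (a common heuristic when focal length and principal point are unknown), and the sign detections and camera poses are made up for illustration.

```python
# Minimal sketch: triangulating one traffic sign seen in two frames once
# approximate intrinsics are available. Detections and poses are made up.
import numpy as np
import cv2

def approx_intrinsics(width, height):
    """Heuristic pinhole intrinsics when calibration is unknown:
    focal length ~ image width, principal point at the image centre."""
    f = float(width)
    return np.array([[f, 0.0, width / 2.0],
                     [0.0, f, height / 2.0],
                     [0.0, 0.0, 1.0]])

K = approx_intrinsics(1242, 375)  # KITTI-like image size

# Hypothetical world-to-camera poses from GPS/ego-motion, as 3x4 [R|t]
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
R2 = np.eye(3)
t2 = np.array([[0.0], [0.0], [-5.0]])  # camera moved ~5 m forward
P2 = K @ np.hstack([R2, t2])

# Hypothetical pixel centres of the same sign detected in both frames
x1 = np.array([[700.0], [150.0]])
x2 = np.array([[760.0], [140.0]])

# Linear triangulation; the result is homogeneous, so dehomogenize
X_h = cv2.triangulatePoints(P1, P2, x1, x2)
X = (X_h[:3] / X_h[3]).ravel()
print("Estimated 3D sign position in the first camera's frame (m):", X)
```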

    Adversarial Attacks on Monocular Pose Estimation

    Full text link
    Advances in deep learning have resulted in steady progress in computer vision, with improved accuracy on tasks such as object detection and semantic segmentation. Nevertheless, deep neural networks are vulnerable to adversarial attacks, presenting a challenge to reliable deployment. Two of the prominent tasks in 3D scene-understanding for robotics and advanced driver assistance systems are monocular depth and pose estimation, often learned together in an unsupervised manner. While studies evaluating the impact of adversarial attacks on monocular depth estimation exist, a systematic demonstration and analysis of adversarial perturbations against pose estimation are lacking. We show how additive imperceptible perturbations can not only change predictions to increase the trajectory drift but also catastrophically alter its geometry. We also study the relation between adversarial perturbations targeting monocular depth and pose estimation networks, as well as the transferability of perturbations to other networks with different architectures and losses. Our experiments show how the generated perturbations lead to notable errors in relative rotation and translation predictions and elucidate vulnerabilities of the networks.
    Comment: Accepted at the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022).
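
    As an illustration of the class of attack described (additive, imperceptible perturbations against a pose network), here is a minimal FGSM-style sketch; the tiny network, input tensors, and epsilon are placeholders, and the paper's actual attack formulation and targets may differ.

```python
# Minimal FGSM-style sketch: an additive, epsilon-bounded perturbation that
# pushes a (placeholder) pose network's relative-pose prediction away from
# its clean output. The network and inputs here are stand-ins.
import torch
import torch.nn as nn

class TinyPoseNet(nn.Module):
    """Placeholder: maps a pair of stacked frames to a 6-DoF relative pose."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 6))
    def forward(self, x):
        return self.encoder(x)

def fgsm_on_pose(net, frames, eps=2.0 / 255):
    """One-step attack: maximize deviation from the clean pose prediction."""
    with torch.no_grad():
        clean_pose = net(frames)
    frames_adv = frames.clone().requires_grad_(True)
    loss = torch.norm(net(frames_adv) - clean_pose, p=2)
    loss.backward()
    perturbation = eps * frames_adv.grad.sign()
    return (frames + perturbation).clamp(0, 1), perturbation

net = TinyPoseNet().eval()
frames = torch.rand(1, 6, 128, 416)  # two stacked RGB frames, KITTI-like size
adv_frames, delta = fgsm_on_pose(net, frames)
print("max |perturbation|:", delta.abs().max().item())
```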

    Multimodal Scale Consistency and Awareness for Monocular Self-Supervised Depth Estimation

    Full text link
    Dense depth estimation is essential to scene-understanding for autonomous driving. However, recent self-supervised approaches on monocular videos suffer from scale-inconsistency across long sequences. Utilizing data from the ubiquitously copresent global positioning systems (GPS), we tackle this challenge by proposing a dynamically-weighted GPS-to-Scale (g2s) loss to complement the appearance-based losses. We emphasize that the GPS is needed only during the multimodal training, and not at inference. The relative distance between frames captured through the GPS provides a scale signal that is independent of the camera setup and scene distribution, resulting in richer learned feature representations. Through extensive evaluation on multiple datasets, we demonstrate scale-consistent and -aware depth estimation during inference, improving the performance even when training with low-frequency GPS data.
    Comment: Accepted at the 2021 IEEE International Conference on Robotics and Automation (ICRA).
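
    As a rough illustration of the idea, a GPS-to-scale style penalty can be written as a constraint that the norm of the predicted inter-frame translation matches the GPS-measured displacement; the sketch below uses a constant placeholder weight rather than the paper's dynamic weighting, and all tensors are toy values.

```python
# Minimal sketch of a GPS-to-scale style loss: penalize the gap between the
# norm of the predicted inter-frame translation and the distance measured by
# GPS. The weighting here is a simple placeholder, not the paper's scheme.
import torch

def gps_to_scale_loss(pred_translation, gps_distance, weight=1.0):
    """
    pred_translation: (B, 3) predicted camera translation between frames.
    gps_distance:     (B,)   metric distance between the same frames from GPS.
    """
    pred_norm = pred_translation.norm(dim=-1)
    return weight * torch.abs(pred_norm - gps_distance).mean()

# Toy usage with made-up predictions and GPS displacements (metres)
pred_t = torch.tensor([[0.0, 0.0, 0.9], [0.1, 0.0, 1.2]], requires_grad=True)
gps_d = torch.tensor([1.0, 1.1])
loss = gps_to_scale_loss(pred_t, gps_d)
loss.backward()  # would be added to the appearance-based losses in training
print("g2s-style loss:", loss.item())
```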

    Crowdsourced 3D Mapping: A Combined Multi-View Geometry and Self-Supervised Learning Approach

    Full text link
    The ability to efficiently utilize crowdsourced visual data carries immense potential for the domains of large-scale dynamic mapping and autonomous driving. However, state-of-the-art methods for crowdsourced 3D mapping assume prior knowledge of camera intrinsics. In this work, we propose a framework that estimates the 3D positions of semantically meaningful landmarks such as traffic signs without assuming known camera intrinsics, using only a monocular color camera and GPS. We utilize multi-view geometry as well as deep learning based self-calibration, depth, and ego-motion estimation for traffic sign positioning, and show that combining their strengths is important for increasing the map coverage. To facilitate research on this task, we construct and make available a KITTI-based 3D traffic sign ground truth positioning dataset. Using our proposed framework, we achieve an average single-journey relative and absolute positioning accuracy of 39 cm and 1.26 m, respectively, on this dataset.
    Comment: Accepted at the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
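
    To illustrate the learning-based half of such a combination, the sketch below back-projects a detected sign's pixel into 3D using a monocular depth estimate and approximate intrinsics, the kind of single-frame estimate that can cover frames where multi-view triangulation is degenerate; the intrinsics, detection, and depth value are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: back-projecting a detected traffic sign to 3D from a single
# frame using a monocular depth estimate, a fallback for frames where
# multi-view triangulation lacks parallax. All values are illustrative.
import numpy as np

def backproject(u, v, depth, K):
    """Lift pixel (u, v) with metric depth into the camera frame."""
    x = (u - K[0, 2]) / K[0, 0]
    y = (v - K[1, 2]) / K[1, 1]
    return depth * np.array([x, y, 1.0])

K = np.array([[721.5, 0.0, 609.6],   # KITTI-like intrinsics (assumed known
              [0.0, 721.5, 172.9],   # or estimated by self-calibration)
              [0.0, 0.0, 1.0]])

sign_pixel = (700.0, 150.0)          # hypothetical detection centre
sign_depth = 12.4                    # metres, from a depth network

p_cam = backproject(*sign_pixel, sign_depth, K)
print("Sign position in the camera frame (m):", p_cam)
# A camera-to-world pose from GPS/ego-motion would then map p_cam into the map.
```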

    Practical Auto-Calibration for Spatial Scene-Understanding from Crowdsourced Dashcamera Videos

    Full text link
    Spatial scene-understanding, including dense depth and ego-motion estimation, is an important problem in computer vision for autonomous vehicles and advanced driver assistance systems. Thus, it is beneficial to design perception modules that can utilize crowdsourced videos collected from arbitrary vehicular onboard or dashboard cameras. However, the intrinsic parameters corresponding to such cameras are often unknown or change over time. Typical manual calibration approaches require objects such as a chessboard or additional scene-specific information. On the other hand, automatic camera calibration does not have such requirements. Yet, the automatic calibration of dashboard cameras is challenging, as forward and planar navigation results in critical motion sequences with reconstruction ambiguities. Structure reconstruction of complete visual sequences that may contain tens of thousands of images is also computationally untenable. Here, we propose a system for practical monocular onboard camera auto-calibration from crowdsourced videos. We show the effectiveness of our proposed system on the KITTI raw, Oxford RobotCar, and the crowdsourced D²-City datasets in varying conditions. Finally, we demonstrate its application for accurate monocular dense depth and ego-motion estimation on uncalibrated videos.
    Comment: Accepted at the 16th International Conference on Computer Vision Theory and Applications (VISAPP, 2021).
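
    One way to picture what auto-calibration has to do is a search over candidate focal lengths, scored by how well an essential matrix explains two-view correspondences; the sketch below runs that search on synthetic correspondences with some rotation in the motion (purely forward motion, as the abstract notes, is degenerate for this). It illustrates the principle only and is not the paper's system.

```python
# Minimal sketch of one core idea behind monocular auto-calibration: score
# candidate focal lengths by how well an essential matrix can explain the
# two-view point correspondences, and keep the best-scoring candidate.
# The correspondences below are synthetic; a real pipeline would track
# features across dashcam frames and must avoid the degenerate, purely
# forward motions discussed in the paper.
import numpy as np
import cv2

rng = np.random.default_rng(0)
f_true, w, h = 800.0, 1280, 720
cx, cy = w / 2.0, h / 2.0

def project(pts, f, R, t):
    """Pinhole projection of Nx3 world points with pose (R, t)."""
    cam = (R @ pts.T).T + t
    return cam[:, :2] / cam[:, 2:3] * f + np.array([cx, cy])

def mean_sampson(E, xn1, xn2):
    """Mean Sampson error of E on Nx3 homogeneous normalized points."""
    Ex1, Etx2 = xn1 @ E.T, xn2 @ E
    num = np.sum(xn2 * Ex1, axis=1) ** 2
    den = Ex1[:, 0] ** 2 + Ex1[:, 1] ** 2 + Etx2[:, 0] ** 2 + Etx2[:, 1] ** 2
    return float(np.mean(num / den))

# Synthetic scene and a second view with some rotation (not purely forward)
pts3d = rng.uniform([-6, -2, 8], [6, 2, 30], size=(200, 3))
R2, _ = cv2.Rodrigues(np.array([[0.0], [0.15], [0.02]]))
t2 = np.array([0.4, 0.0, -1.0])
x1 = project(pts3d, f_true, np.eye(3), np.zeros(3))
x2 = project(pts3d, f_true, R2, t2)

best = (None, np.inf)
for f in np.linspace(500, 1100, 25):          # candidate focal lengths
    K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1.0]])
    E, _ = cv2.findEssentialMat(x1, x2, K, method=cv2.RANSAC, threshold=0.5)
    if E is None:
        continue
    E = E[:3]                                 # keep one candidate if stacked
    norm = lambda x: np.column_stack([(x - [cx, cy]) / f, np.ones(len(x))])
    err = mean_sampson(E, norm(x1), norm(x2))
    if err < best[1]:
        best = (f, err)
print(f"true focal length: {f_true:.0f}, recovered: {best[0]:.0f}")
```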

    AI-Driven Road Maintenance Inspection v2: Reducing Data Dependency & Quantifying Road Damage

    Full text link
    Road infrastructure maintenance inspection is typically a labor-intensive and critical task to ensure the safety of all road users. Existing state-of-the-art techniques in Artificial Intelligence (AI) for object detection and segmentation help automate a huge chunk of this task, given adequate annotated data. However, annotating videos from scratch is cost-prohibitive. For instance, it can take an annotator several days to annotate a 5-minute video recorded at 30 FPS. Hence, we propose an automated labelling pipeline that leverages techniques like few-shot learning and out-of-distribution detection to generate labels for road damage detection. In addition, our pipeline includes a risk factor assessment for each damage by instance quantification to prioritize locations for repairs, which can lead to optimal deployment of road maintenance machinery. We show that the AI models trained with these techniques can not only generalize better to unseen real-world data with a reduced requirement for human annotation but also provide an estimate of maintenance urgency, thereby leading to safer roads.
    Comment: Accepted at IRF Global R2T Conference & Exhibition 202
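
    As an illustration of instance-level quantification for prioritization, the sketch below scores each detected damage instance by the fraction of the road area it covers, weighted by a hypothetical per-class severity table; the classes, weights, and masks are made up, and the paper's risk factor definition may differ.

```python
# Minimal sketch of instance-level damage quantification: score each detected
# damage instance by the fraction of road pixels it covers, weighted per
# damage type. The weights and the masks below are illustrative only.
import numpy as np

# Hypothetical severity weights per damage class (not from the paper)
CLASS_WEIGHT = {"pothole": 3.0, "crack": 1.0, "raveling": 2.0}

def risk_scores(instance_masks, classes, road_mask):
    """instance_masks: list of HxW bool arrays, one per detected damage.
    classes: damage class name per instance. road_mask: HxW bool array."""
    road_area = max(int(road_mask.sum()), 1)
    scores = []
    for mask, cls in zip(instance_masks, classes):
        area_fraction = (mask & road_mask).sum() / road_area
        scores.append(CLASS_WEIGHT.get(cls, 1.0) * area_fraction)
    return scores

# Toy example: a 100x200 road region with two damage instances
road = np.zeros((100, 200), dtype=bool); road[40:, :] = True
pothole = np.zeros_like(road); pothole[60:70, 50:65] = True
crack = np.zeros_like(road); crack[45:90, 120:122] = True
print(risk_scores([pothole, crack], ["pothole", "crack"], road))
```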

    Robot Placement for Mobile Manipulation in Domestic Environments

    No full text
    The development of domestic mobile manipulators for unconstrained environments has driven significant research recently. Robot Care Systems has been pioneering the development of a prototype mobile manipulator for elderly care. It has a 6-degrees-of-freedom robotic arm mounted on their flagship robot LEA, a non-holonomic differential drive platform. In order to utilize the navigation and manipulation capabilities of such mobile manipulators, a robot placement algorithm is sought that computes a favorable position and orientation of the mobile base, enabling the end effector to reach a desired target. None of the existing approaches performs robot placement while ensuring a high chance of successfully planning a short path to the target and accounting for the sensing and actuation errors typical of real-world scenarios. This thesis presents a novel robot placement algorithm, DeCOWA (Determining Commutation configuration using Optimization and Workspace Analysis), with these characteristics. Since the approach to robot placement depends upon the kind of mobile manipulation, a comparative study of sequential and full-body methods is performed with respect to criteria important in domestic settings. Sequential mobile manipulation is found to be most suitable, for which a modular mobile manipulation framework encompassing motion planning and robot placement is presented. With sequential mobile manipulation, the ability to successfully reach a target depends upon the kinematic capabilities of the arm. Accordingly, robot placement with DeCOWA determines a favorable location for the arm and a corresponding platform orientation. To find the position of the arm's base, an offline manipulator workspace analysis is performed, generating the Inverse Reachability and Planability maps. During online use, these maps are combined into an Inverse Fusion Map that ranks different locations based on the ability of an arm placed there to find a successful and short motion plan to the target. This map is filtered to generate a set of feasible locations at the arm's height. Through a ranked iterative search, a suitable collision-free arm location is determined, followed by minimization of the platform's distance from the robot's current pose. The approach is evaluated against an unbiased random placement of the robot near the target, using a sample set of twenty scenes mimicking domestic settings. It is found that DeCOWA is able to generate commutation configurations in a fraction of a second that lead to a high planning success rate and a short path length while accounting for the goal tolerance of navigation. Also, its modularity allows the use of several planability metrics, making it useful for domestic applications.
    Mechanical Engineering | Biomechanical Design - BioRobotics
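
    As a rough illustration of the fusion-and-ranking idea, the sketch below combines per-cell reachability and planability scores over a grid of candidate base positions, removes cells in collision, and picks a highly ranked cell close to the robot; the grids, fusion rule, and scores are placeholders rather than the thesis's actual Inverse Fusion Map.

```python
# Minimal sketch in the spirit of DeCOWA: fuse precomputed reachability and
# planability scores over a 2D grid of candidate arm-base positions, drop
# cells in collision, and pick a well-ranked cell close to the robot. The
# grids, fusion rule, and scores are illustrative, not the thesis's maps.
import numpy as np

rng = np.random.default_rng(1)
H, W = 40, 40                              # candidate grid around the target
reachability = rng.random((H, W))          # offline: can the arm reach from here?
planability = rng.random((H, W))           # offline: how easily can it plan?
collision = rng.random((H, W)) < 0.2       # online: occupied cells

fusion = reachability * planability        # placeholder fusion rule
fusion[collision] = -np.inf                # infeasible placements

robot_xy = np.array([5.0, 5.0])            # robot's current grid position
ys, xs = np.unravel_index(np.argsort(fusion, axis=None)[::-1], fusion.shape)

# Ranked iterative search: among the top-ranked feasible cells, prefer the
# one that minimizes the platform's travel distance.
top_k = 20
candidates = np.column_stack([xs[:top_k], ys[:top_k]]).astype(float)
dists = np.linalg.norm(candidates - robot_xy, axis=1)
best = candidates[np.argmin(dists)]
print("Chosen base placement (grid coords):", best)
```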